Region Dependent Transform on MLP Features for Speech Recognition

Authors

Tim Ng

Bing Zhang

Spyridon Matsoukas

Long Nguyen

Abstract

In this work, Region Dependent Transform (RDT) is used as a feature extraction process to combine traditional short-term acoustic features with features derived from Multi-Layer Perceptrons (MLPs) that are trained on long-term features. Compared with the conventional feature augmentation approach, a substantial improvement is obtained. Moreover, an improved RDT training procedure that takes speaker dependent transforms into account is proposed for feature combination in Speaker Adaptive Training. By using RDT to incorporate the higher dimensional features output by the layer prior to the bottleneck layer into our Speech-to-Text (STT) system, a significant improvement is achieved over the conventional bottleneck features. In summary, using MLP-derived features with RDT yields an 8.2% to 11.4% relative reduction in Character Error Rate for our Mandarin STT systems.
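The abstract does not spell out the form of the transform. As a minimal sketch, assume the commonly described formulation in which a Gaussian mixture model divides the input feature space into regions and each frame is projected by a posterior-weighted sum of per-region affine transforms. Under that assumption, combining short-term and MLP-derived features could look roughly like the following. All names, dimensions, and the random parameters are hypothetical and chosen only for illustration. Training of the transforms, including the speaker-dependent variant mentioned above, is not shown.

```python
import numpy as np

def region_posteriors(x, means, variances, weights):
    """Posterior of each diagonal-covariance Gaussian region given frame x."""
    # log N(x; mu_r, diag(var_r)) for every region r
    log_lik = -0.5 * np.sum(
        (x - means) ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def rdt_project(x, means, variances, weights, A, b):
    """Posterior-weighted sum of per-region affine transforms of x."""
    gamma = region_posteriors(x, means, variances, weights)  # shape (R,)
    per_region = A @ x + b                                   # shape (R, out_dim)
    return gamma @ per_region                                # shape (out_dim,)

# Toy data (assumed sizes): one frame of short-term and MLP-derived features.
rng = np.random.default_rng(0)
short_term = rng.normal(size=4)  # stand-in for short-term acoustic features
mlp_feats = rng.normal(size=3)   # stand-in for MLP-derived features
x = np.concatenate([short_term, mlp_feats])  # concatenated input frame

R, in_dim, out_dim = 2, x.size, 5
means = rng.normal(size=(R, in_dim))
variances = np.ones((R, in_dim))
weights = np.full(R, 1.0 / R)
A = rng.normal(size=(R, out_dim, in_dim))  # one projection matrix per region
b = np.zeros((R, out_dim))                 # one offset vector per region

y = rdt_project(x, means, variances, weights, A, b)
```

With a single region the projection reduces to one global affine transform, which is one way to see how this differs from simply concatenating the two feature streams and applying a single linear reduction.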

Similar resources

Speaker independent phoneme recognition by MLP using wavelet features

Feature extraction is one of the most important tasks in a speech recognition system. Most speech recognition systems use the Short Time Fourier Transform (STFT) to derive features from spoken utterances. In this paper we try to exploit the higher time–frequency resolution of the Discrete Wavelet Transform (DWT) to extract speaker independent features. The features ...

Jointly optimized discriminative features for speech recognition

In the past decade, methods to extract long-term acoustic features for speech recognition using Multi-Layer Perceptrons have been proposed. These features have proved to be good complementary features in some feature augmentation schemes and/or through system combination. Usually, conventional linear dimension reduction algorithms, e.g. Linear Discriminant Analysis, are not applied on the combi...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both?

Gaussian Mixture Model (GMM) and Multi Layer Perceptron (MLP) based acoustic models are compared on a French large vocabulary continuous speech recognition (LVCSR) task. In addition to optimizing the output layer size of the MLP, the effect of the deep neural network structure is also investigated. Moreover, using different linear transformations (time derivatives, LDA, CMLLR) on conventional M...

On using MLP features in LVCSR

One of the major research thrusts in the speech group at ICSI is to use Multi-Layer Perceptron (MLP) based features in automatic speech recognition (ASR). This paper presents a study of three aspects of this effort: 1) the properties of the MLP features which make them useful, 2) incorporating MLP features together with PLP features in ASR, and 3) possible redundancy between MLP features and mo...

Publication date: 2011
